feat(http): per-turn cost_usd in /v1/chat/completions usage #68
Merged
Captures the chat-completions HTTP face (#65) and model-routing matrix integration (#64). New `amplifier-agent serve chat-completions` exposes amplifier-agent as an OpenAI-compatible HTTP service for embedding in third-party tools (opencode, custom UIs). New `amplifier-agent auth` subcommand persists provider credentials to `~/.amplifier-agent/credentials.json` so users can configure once and have every invocation pick them up.

Wire protocol unchanged at 0.3.0; no wrapper bump required. TypeScript wrapper stays at 0.7.0, Python wrapper stays at 0.3.0. See CHANGELOG.md for full details.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Provider modules already compute cost_usd per turn and emit it on the JSON-RPC NDJSON wire via hook_streaming (amplifier_agent_lib/bundle/hook_streaming.py). The chat-completions HTTP face was throwing this away, extracting only token counts from kernel usage events. Now it lifts cost_usd through too, accumulating across sub-turns (a single user turn can drive multiple LLM calls for tool-call rounds) and emitting the total in the OpenAI usage envelope as the non-standard `cost_usd` field. Real $$ from the provider's own pricing, surfaced on the SSE response.

Implementation
--------------

- `_event_translator.extract_usage`: widened return type from `dict[str, int]` to `dict[str, Any]`; reads `event['cost']` (set by hook_streaming from kernel `cost_usd`) and stamps it on the result as `cost_usd: str(...)`.
- `_wire._build_usage_block / stop_chunk / tool_calls_stop_chunk`: new `cost_usd: str | None = None` parameter; surfaced on the usage block when set.
- `routes/chat_completions`: accumulates `usage_cost: Decimal | None` across all usage events in the turn (preserves precision), serializes to str on emission, passes through to the terminal chunk helpers.

cost_usd is a string (Decimal precision) and is omitted entirely when no provider emitted it (older provider modules, third-party endpoints without cost telemetry, etc.); standard OpenAI clients ignore the non-standard field.

Verified end-to-end against the running server: a one-word reply with claude-haiku-4-5 produces `cost_usd: '0.0118625'` in the terminal chunk's usage block.

This is a wire translation: it leverages cost telemetry that already flows on the NDJSON wire. No new pricing catalog or provider-side plumbing required.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
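The sub-turn accumulation described above fits in a few lines. This is a minimal sketch, not the code in `routes/chat_completions`: `accumulate_cost` is a hypothetical helper, and it assumes each extracted usage dict either carries `cost_usd` (as `extract_usage` stamps it) or omits the key. The cost figures are illustrative, not real pricing.

```python
from decimal import Decimal
from typing import Any, Iterable, Optional


def accumulate_cost(usage_events: Iterable[dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: sum per-sub-turn costs, or None if none reported."""
    total: Optional[Decimal] = None
    for usage in usage_events:
        cost = usage.get("cost_usd")
        if cost is None:
            # This sub-turn's provider emitted no cost telemetry; skip it.
            continue
        # Going through str() turns a float like 0.1 into Decimal("0.1")
        # rather than its full binary expansion.
        amount = Decimal(str(cost))
        total = amount if total is None else total + amount
    # Serialize once, at emission time, so no precision is lost on the way.
    return None if total is None else str(total)


# One user turn driving two LLM calls: a tool-call round, then the final reply.
events = [
    {"prompt_tokens": 900, "completion_tokens": 40, "cost_usd": "0.0011"},
    {"prompt_tokens": 1000, "completion_tokens": 12, "cost_usd": "0.0002"},
]
print(accumulate_cost(events))  # 0.0013
print(accumulate_cost([{"prompt_tokens": 5}]))  # None
```

Returning `None` rather than `"0"` is what lets the field be omitted for providers without cost telemetry. The `str()` hop also matters if a float ever reaches the helper: two float costs of 0.1 and 0.2 sum to `"0.3"` here, not the float result `0.30000000000000004`.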
9344217 to bccb5f3
Summary
Surfaces per-turn dollar cost on the chat-completions HTTP face by lifting the `cost_usd` value that provider modules already emit on the NDJSON wire (via `amplifier_agent_lib/bundle/hook_streaming.py`) into the OpenAI usage envelope.

Pure wire translation: no new pricing data, no provider-side changes, no parallel catalog surface. The cost telemetry already flows on the JSON-RPC wire; this PR just plumbs it through to the OpenAI-shape wire that opencode and other OpenAI-compatible clients see.
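A minimal sketch of the lift that `extract_usage` performs, assuming a kernel usage event shaped as a plain dict. `extract_usage_sketch` and the `input_tokens`/`output_tokens` keys are hypothetical; only the `event['cost']` read and the `cost_usd: str(...)` stamp come from this PR's description.

```python
from typing import Any


def extract_usage_sketch(event: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for _event_translator.extract_usage."""
    # ASSUMPTION: these token key names are illustrative, not the kernel's real keys.
    usage: dict[str, Any] = {
        "prompt_tokens": int(event.get("input_tokens", 0)),
        "completion_tokens": int(event.get("output_tokens", 0)),
    }
    # "cost" is set by hook_streaming from the kernel's cost_usd.
    cost = event.get("cost")
    if cost is not None:
        # Stringify so downstream code can rebuild an exact Decimal.
        usage["cost_usd"] = str(cost)
    return usage


print(extract_usage_sketch({"input_tokens": 10, "output_tokens": 3, "cost": 0.0118625}))
print(extract_usage_sketch({"input_tokens": 10, "output_tokens": 3}))
```

The widened `dict[str, Any]` return type follows directly from this: the result now mixes integer token counts with a string cost.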
What the wire looks like now
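The terminal chunk carries the turn total next to the standard token counts. The sketch below builds one such SSE event. `build_usage_block` and `stop_chunk` are hypothetical stand-ins for the `_wire.py` helpers (their real signatures may differ), the chunk omits fields such as `created`, and the token counts are illustrative; only the `cost_usd` value comes from the PR's end-to-end check.

```python
import json
from typing import Any, Optional


def build_usage_block(
    prompt_tokens: int, completion_tokens: int, cost_usd: Optional[str] = None
) -> dict[str, Any]:
    """Hypothetical mirror of _wire._build_usage_block."""
    usage: dict[str, Any] = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    if cost_usd is not None:
        # Non-standard field, present only when a provider reported a cost.
        usage["cost_usd"] = cost_usd
    return usage


def stop_chunk(chunk_id: str, model: str, usage: dict[str, Any]) -> str:
    """Hypothetical mirror of _wire.stop_chunk: one SSE event with the usage block."""
    chunk = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        "usage": usage,
    }
    # SSE framing: "data: <json>" terminated by a blank line.
    return f"data: {json.dumps(chunk)}\n\n"


print(stop_chunk("chatcmpl-123", "claude-haiku-4-5",
                 build_usage_block(20, 2, cost_usd="0.0118625")), end="")
```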
`cost_usd` is a string to preserve Decimal precision. Standard OpenAI clients ignore the non-standard field; cost-aware clients can render the real dollar value.

Implementation
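| File | Change |
| --- | --- |
| `_event_translator.py` | `extract_usage` widens return type to `dict[str, Any]`; reads `event['cost']` (set by `hook_streaming` from kernel `cost_usd`) and stamps `cost_usd: str(...)` on the result |
| `_wire.py` | `_build_usage_block`, `stop_chunk`, `tool_calls_stop_chunk` accept `cost_usd: str \| None = None`; the value is surfaced on the usage block when set |
| `routes/chat_completions.py` | Accumulates `usage_cost: Decimal \| None` across all usage events in the turn (preserving precision), serializes it to str on emission, and passes it through to the terminal chunk helpers |

`cost_usd` is omitted from the response entirely when no provider emitted it (older provider modules, third-party endpoints without cost telemetry).

Verified end-to-end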
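Against the running server, a one-word reply from `claude-haiku-4-5` produces `cost_usd: '0.0118625'` in the terminal chunk's usage block. Real $$ from Anthropic's pricing, accumulated and surfaced on the OpenAI wire.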
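A cost-aware client can read the figure back out of the stream. This is a minimal sketch, assuming the response arrives as SSE lines of the form `data: {...}` ending with `data: [DONE]` (the usual OpenAI streaming framing). `turn_cost` is a hypothetical client helper, and the token counts in the sample stream are illustrative.

```python
import json
from decimal import Decimal
from typing import Iterable, Optional


def turn_cost(sse_lines: Iterable[str]) -> Optional[Decimal]:
    """Hypothetical client helper: return the turn's cost_usd, if any chunk carried it."""
    cost: Optional[Decimal] = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        # Non-terminal chunks may carry "usage": null, hence the "or {}".
        usage = json.loads(payload).get("usage") or {}
        if "cost_usd" in usage:
            # Parse the string directly so no precision is lost.
            cost = Decimal(usage["cost_usd"])
    return cost


sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], '
    '"usage": {"prompt_tokens": 20, "completion_tokens": 2, "total_tokens": 22, '
    '"cost_usd": "0.0118625"}}',
    "",
    "data: [DONE]",
]
print(turn_cost(sample_stream))  # 0.0118625
```

A stream from a provider without cost telemetry simply yields `None`, while a client that never looks for `cost_usd` sees an ordinary OpenAI usage block.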
Compatibility
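- Standard OpenAI clients ignore the non-standard `cost_usd` field.
- The field is populated only when the provider module emits `cost_usd` (anthropic, openai, and chat-completions providers all do today).

Companion PRs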
- closed (catalog approach was redundant with existing NDJSON flow)
- microsoft/amplifier-module-provider-anthropic#59
- microsoft/amplifier-app-opencode#2: surfaces per-model `limit` in the opencode config (was always available in /v1/models; was stripped for paranoia)